{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lunar Lander using A2C\n",
    "\n",
    "Let's learn how to implement A2C with stable baselines for the lunar landing task. In the\n",
    "lunar lander environment, our agent drives the space vehicle and the goal of the agent is to\n",
    "land correctly on the landing pad. If our agent (lander) lands away from the landing pad,\n",
    "then it loses the reward and the episode will get terminated if the agent crashes or comes to\n",
    "rest. The action space of the environment includes four discrete actions which are do\n",
    "nothing, a fire left orientation engine, fire main engine, and fire right orientation engine.\n",
    "Now, Let's see how to train the agent using A2C to correctly land on the landing pad.\n",
    "\n",
    "First, let's import the necessary libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "import gym\n",
    "from stable_baselines.common.policies import MlpPolicy\n",
    "from stable_baselines.common.vec_env import DummyVecEnv\n",
    "from stable_baselines.common.evaluation import evaluate_policy\n",
    "from stable_baselines import A2C\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the lunar lander environment using gym:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "env = gym.make('LunarLander-v2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use the dummy vectorized environment, we learned that in the dummy vectorized\n",
    "environment, we run each environment in the same process:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "env = DummyVecEnv([lambda: env])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = A2C(MlpPolicy, env, ent_coef=0.1, verbose=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<stable_baselines.a2c.a2c.A2C at 0x7fbb000ca518>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent.learn(total_timesteps=25000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training, we can evaluate our agent by looking at the mean rewards:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean_reward, n_steps = evaluate_policy(agent, agent.get_env(),\n",
    "n_eval_episodes=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also have a look at how our trained agent performs in the environment:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "state = env.reset()\n",
    "while True:\n",
    "    action, _states = agent.predict(state)\n",
    "    next_state, reward, done, info = env.step(action)\n",
    "    state = next_state\n",
    "    env.render()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
